ABSTRACT


This report deals with design principles for iterative computational networks.
Such computational networks are used for performing repetitive computations
which typically are not data-dependent. Most of the signal processing
algorithms, like FFT and filtering, belong to this class.

The main idea in this report is the development of a mathematical notation for
expressing such designs. This notation captures the important features and
properties of these computational networks, and can be used for analyzing,
designing, and objectively evaluating computational networks.
1. INTRODUCTION


The central point of this report is the application of a precise mathematical
notation to express computational networks. This notation captures the
concepts of arithmetic operations (such as addition and multiplication) and of
timing (e.g., delaying). Once a design is expressed by means of such a
mathematical notation, it can be evaluated objectively against a predefined set
of design objectives, like performance and cost.

The next section defines the design objectives that guide the examples in this
report. Obviously, other sets of design objectives may be used without
deviating from the spirit of the report.

Section 3 deals with the implementation of a Finite Impulse Response (FIR)
filter, a typical signal processing problem. In that section, several designs are
suggested and evaluated objectively, and the mathematical notation to express
them is developed in parallel.

Throughout this report the term "design" means the structure/architecture of
the computational network. This term is the hardware equivalent of the
software term "algorithm".

In section 3 we first consider a design that closely follows the
mathematical definition of the FIR filter. Later this design is transformed
several times in order to improve it with respect to the predefined design
objectives.

In that section the graphic representations of these designs are the source of
intuition, and their mathematical representations are mainly a means for
verifying the correctness of the various transformations of the design.

In section 4 the same technique and the same notation are applied to
multiplication of polynomials. In this section the mathematical representation
is the guiding force, and the graphic representations are used only for
demonstration.

In section 5 the same technique is used for division of polynomials and for
simultaneous multiplication and division of polynomials. In this section the
mathematical notation is the only tool used, and the graphic drawings serve
only for demonstration.
In section 6 the same technique is applied to synthetic aperture radar (SAR)
processing. Several designs, which result directly from the mathematical
definition and the notation, are considered and evaluated.

It is our conviction that this mathematical notation is a very powerful tool,
complementing the intuition which is based on conventional graphic
representation.
2. THE DESIGN GOALS


In order to achieve an optimal design, it is necessary to define the design
objectives. The following are typically considered to be important:

(a) Correctness and accuracy

(b) High computation rate

(c) Low delay

(d) Low parts count

(e) Modularity, simplicity, etc.

(f) Low power

(g) Small size

(h) Low cost


Obviously, this is only a partial list. For different applications the relative
weights of these objectives may vary. It is generally accepted that (a) is the
most important, even though we seem to have evidence that this is not
always the case.

In some cases (h) is the dominant factor, in others it is (f) and (g). In this
report, we consider (a) through (e), in that priority order.
3. THE FIR-FILTER EXAMPLE


Consider the Finite Impulse Response (FIR) filter defined by

$$ y_n = \sum_{i=1}^{N} a_i \, x_{n-i} \qquad (1) $$


This is a nonrecursive filter of the Nth order. Each output (Y) is a weighted
average of the previous N inputs (X).

Typically, the X sequence is a time series, and the $\{x_i\}$ are available
sequentially, starting at $x_1$, continuing through $x_2$ and $x_3$ up to $x_m$, where
typically $m \gg N$.

The "edge-effect" at the initialization may be ignored. It is typical to
define $x_i = 0$ for $i \le 0$.

THE Z OPERATOR

Let Z be the delay operator such that

$$ Z x_n = x_{n-1} \qquad (2) $$

In a system controlled by a central master clock, this Z operator may be
implemented by a simple register.

Similarly, $Z^n$ is defined by

$$ Z^n x_i = x_{i-n} \qquad (3) $$

$Z^n$ can be implemented by an n-stage shift register, which is a FIFO
(queue).

We will use the following properties of the Z operator:

(i) $Z^n F(x, y) = F(Z^n x, Z^n y)$ for all n, and

(ii) if C is a constant then $Z^n C = C$ for all n.

Negative values of n mean prediction by |n| steps into the future. Since
prediction of external input is not easy to implement, it is advisable to use
only $n \ge 0$ when applying the $Z^n$ operator to the input.
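As an illustration only (this model is ours, not part of the report), the Z operator and its n-stage shift-register realization can be sketched in Python as a clocked FIFO. The names `Delay` and `apply` are hypothetical, and the registers are assumed to start cleared, which reproduces the convention $x_i = 0$ for $i \le 0$.

```python
from collections import deque


class Delay:
    """Hypothetical model of Z^k: a k-stage shift register (FIFO).

    step() is called once per clock cycle with the current input x_n and
    returns x_{n-k}; the registers start cleared, so x_i = 0 for i <= 0.
    """

    def __init__(self, k: int) -> None:
        if k < 0:
            # Z^k with k < 0 would be a prediction of future input.
            raise ValueError("cannot build Z^k for k < 0")
        self.k = k
        self.fifo = deque([0] * k)

    def step(self, x):
        if self.k == 0:
            return x                  # Z^0 is a plain wire
        self.fifo.append(x)           # x_n enters the queue...
        return self.fifo.popleft()    # ...and x_{n-k} leaves it


def apply(k, xs):
    """Apply Z^k to a whole input sequence, one sample per cycle."""
    z = Delay(k)
    return [z.step(x) for x in xs]


print(apply(2, [1, 2, 3, 4]))  # [0, 0, 1, 2]
```

With this model, property (i) can be checked for F = addition: delaying a sum gives the same sequence as summing the delayed inputs. Property (ii) shows up only after the first n cycles, since cleared registers reproduce the initialization edge effect (for example, `apply(2, [9, 9, 9, 9])` gives `[0, 0, 9, 9]`).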
THE FIR-FILTER IMPLEMENTATION

The expression

$$ y_n = \sum_{i=1}^{N} a_i \, x_{n-i} \qquad (1) $$

may also be written as

$$ y_n = \sum_{i=1}^{N} a_i \, Z^i x_n \qquad (4) $$

By using operator-calculus notation, (4) may be written as

$$ Y = \Big( \sum_{i=1}^{N} a_i Z^i \Big) X \qquad (5) $$

For N = 4 this means

$$ Y = (a_1 Z + a_2 Z^2 + a_3 Z^3 + a_4 Z^4)\, X \qquad (6) $$

which can be implemented by the network shown in figure (F1).


Figure (F1): The implementation of (6).


The circles in figure (F1) with the $a_i$'s represent multiplications by the
constants written inside them.
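To make (1) and figure (F1) concrete, the following Python sketch (our own; function names are hypothetical) evaluates (1) directly and also simulates (F1) cycle by cycle, giving each multiplier $a_i$ its own chain of i delays followed by one N-input summation. Indices follow the text: `a[0]` holds $a_1$ and `xs[0]` holds $x_1$; registers start cleared.

```python
def fir_direct(a, xs):
    """Evaluate (1): y_n = sum_{i=1}^{N} a_i x_{n-i}, with x_i = 0 for i <= 0.

    a[0] holds a_1 and xs[0] holds x_1; returns [y_1, ..., y_m].
    """
    N, m = len(a), len(xs)
    return [
        sum(a[i - 1] * xs[n - i - 1] for i in range(1, N + 1) if n - i >= 1)
        for n in range(1, m + 1)
    ]


def fir_network_f1(a, xs):
    """Cycle-by-cycle model of figure (F1).

    chains[i-1] is the private chain of i delays in front of multiplier a_i;
    during cycle n its register j holds x_{n-j-1}, so its last register
    holds Z^i x_n = x_{n-i}.
    """
    chains = [[0] * i for i in range(1, len(a) + 1)]
    ys = []
    for x in xs:                                   # cycle n: x is x_n
        ys.append(sum(ai * ch[-1] for ai, ch in zip(a, chains)))  # N-input sum
        chains = [[x] + ch[:-1] for ch in chains]  # clock edge
    return ys


a = [1, 2, 3, 4]
print(fir_network_f1(a, [1, 0, 0, 0, 0, 0]))  # impulse response: [0, 1, 2, 3, 4, 0]
```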
Checking this network against the design objectives reveals that

(a) Correctness: The correct expression is indeed computed, since the
values at $P_1$, $P_3$, $P_6$ and $P_{10}$ are $Z x_n$, $Z^2 x_n$, $Z^3 x_n$ and $Z^4 x_n$,
respectively.

(b) Computation rate: The computation rate is the reciprocal of the
computation period, which is the time needed for one multiplication
and for adding N quantities.

(c) Delay: The delay is one Z-period plus the computation period.

It is not simple to quantify the parts count, (d), and the modularity
objective, (e).

However, the parts count, (d), can be improved! Note that the values at
$P_1$, $P_2$, $P_4$ and $P_7$ are all equal to $Z x_n$. Therefore these
points could be unified. Similarly, $P_3$, $P_5$ and $P_8$ could be unified, and so
can $P_6$ and $P_9$.

This does not change (a), (b) and (c), but it does improve (d). The new
network is shown in figure (F2).


Figure (F2): The improved implementation of (6).


Hence, the parts count, objective (d), is improved by the elimination of 6
delay operators, or N(N-1)/2 in the general case. The modularity, objective (e), is
also improved, as seen from the repeated modules, marked by dashed lines in
figure (F2).
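The delay count above can be checked with a short Python sketch (our own; names are hypothetical). In (F1), tap i has a private chain of i delays, $1 + 2 + \cdots + N = N(N+1)/2$ in all; (F2) shares one N-stage chain, so the saving is $N(N-1)/2$. The model of (F2) reads tap i from stage i of the shared chain and should produce exactly the outputs of (1).

```python
def delays_f1(N):
    """Figure (F1): tap i has a private chain of i delays."""
    return sum(range(1, N + 1))          # N(N+1)/2


def delays_f2(N):
    """Figure (F2): one shared N-stage chain."""
    return N


def fir_network_f2(a, xs):
    """Cycle-by-cycle model of figure (F2); a[0] holds a_1, xs[0] holds x_1."""
    line = [0] * len(a)                  # line[j] holds x_{n-j-1} during cycle n
    ys = []
    for x in xs:                         # cycle n: x is x_n
        ys.append(sum(ai * v for ai, v in zip(a, line)))  # still one N-input sum
        line = [x] + line[:-1]           # clock edge: shift the shared chain
    return ys


for N in (4, 8, 16):
    print(N, delays_f1(N) - delays_f2(N))  # delays saved, N(N-1)/2: 6, 28, 120
```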


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IMPROVING  THIS  DESICN 

The  A/-lnput  summation  Is  the  Achilles  heel  of  this  design,  mainly  because  it 
does  not  comply  with  the  modularity  requirement. 

In addition, the direction of the information flow from the repeating modules
into the summation is perpendicular to the direction in which these modules
are arranged. This may cause problems with the geometry of the wiring,
both in LSI and discrete (IC) implementations, and also on and between
printed circuit boards.

Moreover, the required number of output lines from any group of several
modules is proportional to their number, and this may pose severe
problems for implementation at any scale.

The way to implement N-input summation is by N-1 additions. Breaking the
summation operation into N-1 additions, and dividing them between the
modules, as shown in figure (F3), alleviates this problem.


The network shown in figure (F3) is composed of N identical modules. This
is a great improvement for the design objective (e), modularity.

The leftmost adder, the one in the first module (with $a_1$), does not perform
any real addition operation, because one of its inputs always has the value of
zero. The only purpose of including it in this network is to improve the
modularity. Obviously, in discrete implementations, there is no need to
include it. Eliminating it trivially improves the performance and the parts
count. On the other hand, in integrated implementations, such as LSI, having
it there is a small price for reducing the number of different modules
required.

This implementation is represented by

$$ Y = \Big( \sum_{i=1}^{N} a_i Z^i \Big) X \qquad (5) $$


In order to improve the delay involved in this computation, notice that

$$ Z^{-1} Y = \Big( \sum_{i=1}^{N} a_i Z^{i-1} \Big) X = \Big( \sum_{i=0}^{N-1} a_{i+1} Z^{i} \Big) X \qquad (7) $$


Only nonnegative powers of Z are used for the input values (X). The
"prediction" ($Z^{-1}$) is applied only to the output (Y). It means that at the nth
cycle (i.e., when $x_n$ is given) the next Y value, $y_{n+1}$, is available.

This is easy to observe from rewriting (6) as

$$ y_n = a_1 x_{n-1} + a_2 x_{n-2} + a_3 x_{n-3} + a_4 x_{n-4} \qquad (8) $$

and rewriting (7) as

$$ y_{n+1} = a_1 x_n + a_2 x_{n-1} + a_3 x_{n-2} + a_4 x_{n-3} \qquad (9) $$

Both (7) and (9) yield the implementation shown in figure (F4).

Figure (F4): The implementation of (7).




Note that in figure (F4) the leftmost adder (in the first module) is redundant,
as mentioned before. So is the rightmost delay (in the last module), whose
output is not used and which therefore does not tax the performance. It also may
be eliminated in discrete implementations, but in integrated implementations it
is not advisable to do so.
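A cycle-level Python sketch of figure (F4), again our own model with hypothetical names, makes (7) and (9) checkable. It assumes module i multiplies $a_i$ by $Z^{i-1}X$ and that the adder chain settles within the cycle, so the value produced while $x_n$ is presented should be $y_{n+1}$. The last module's register is updated but never read, which is the redundant rightmost delay noted above.

```python
def fir_network_f4(a, xs):
    """Cycle-by-cycle model of figure (F4), i.e. of (7).

    Module i multiplies a_i by Z^{i-1} X and adds the product to the partial
    sum arriving from module i-1 (module 1 adds zero).  a[0] holds a_1.
    """
    N = len(a)
    regs = [0] * N              # regs[i]: the delay inside module i+1; holds x_{n-i-1}
    out = []
    for x in xs:                # cycle n: x is x_n
        upper, partial = x, 0   # the redundant leftmost adder adds this zero
        for i in range(N):
            partial += a[i] * upper   # a_{i+1} * x_{n-i}
            upper = regs[i]           # this module's delay feeds the next module
        out.append(partial)     # y_{n+1}, available while x_n is presented
        regs = [x] + regs[:-1]  # clock edge
    return out


print(fir_network_f4([1, 2, 3, 4], [1, 0, 0, 0, 0]))  # [1, 2, 3, 4, 0]
```

Compared with the impulse response of (1), which starts with $y_1 = 0$, every output appears one cycle earlier, as (7) predicts.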


ABOUT NOTATION

Let us introduce another notation, $\Pi(X, Y)$, representing the multiplication of
X and Y. The purpose of this notation, compared with the usual XY notation,
is to make the multiplication operation explicit in the notation, and to
distinguish between it and the application of operators.

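Note the difference between the following expressions:

$$ y_n = \sum_{i=1}^{N} \Pi(a_i, x_{n-i}) \qquad (1) $$

$$ Y = \Big[ \sum_{i=1}^{N} \Pi(a_i, Z^i) \Big] X \qquad (5) $$

and the following expressions:

$$ y_{n+1} = \sum_{i=0}^{N-1} \Pi(a_{i+1}, x_{n-i}) $$

$$ Z^{-1} Y = \Big[ \sum_{i=0}^{N-1} \Pi(a_{i+1}, Z^i) \Big] X \qquad (10) $$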

The first ones, which require unnecessary delay, have the summation range
of [1, N], which is the "standard" way for mathematicians to express a
set of N objects, whereas the last two use the range [0, N-1], which seems to
be less "convenient", but yields better delay characteristics.

This illustrates the need to beware of "mental traps" that may be caused by
notation.
IMPROVING THE OPERATION RATE


The major deficiency of all the networks considered so far is their operation
rate, objective (b). As noted before, the operation period cannot be shorter
than the time required for a multiplication and the addition of N quantities.

Even when the multipliers are arranged such that the multiplication time
overlaps the addition time, the addition must still propagate through N (or
N-1) stages.

Since N may be very large, it is desirable to eliminate the need for this long
propagation. This may be achieved by using the "carry-save" idea, which
uses extra delays in order to improve the data rate. In our problem we
introduce delay units between the modules, which delay the output by N-1
cycles (relative to figure (F4)) but improve the computation rate.

The resulting network is shown in figure (F5).


Figure  (F5):  Implementing  the  "carry-save"  idea. 


Note that the network in figure (F5) is implemented by using the very same
modules as in figure (F4) and additional delays.

Since three delays were added (for N = 4), the result, which was $Z^{-1}Y$ in
figure (F4), is delayed by $Z^3$ and is now $Z^3(Z^{-1}Y) = Z^2 Y$, and $Z^{N-2} Y$ in
general.
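The following Python sketch (our own model; names are hypothetical) simulates figure (F5) under the assumption that one delay is inserted on the upper (X) line and one on the lower (partial-sum) line between every pair of adjacent modules, in addition to the delay inside each (F4) module. Module i then sees $Z^{2i-2}X$, and the output of the last module should equal $Z^{N-2}Y$.

```python
def fir_network_f5(a, xs):
    """Cycle-by-cycle model of figure (F5): the (F4) modules plus one extra
    delay on the upper (X) line and one on the lower (partial-sum) line
    between every pair of adjacent modules.
    """
    N = len(a)
    upper = [0] * (2 * N - 2)   # upper[k] holds x_{n-k-1} during cycle n
    lower = [0] * (N - 1)       # lower[i]: partial sum registered after module i+1
    out = []
    for x in xs:                # cycle n: x is x_n
        taps = [x] + upper      # taps[k] is Z^k x_n; module i reads Z^{2i-2} X
        sums = []
        for i in range(N):      # i = 0 is module 1 in the text's numbering
            carry_in = lower[i - 1] if i > 0 else 0   # Z S_{J-1}, as in (12)
            sums.append(carry_in + a[i] * taps[2 * i])
        out.append(sums[-1])    # S_N
        lower = sums[:-1]       # clock edge: partial sums move one module right
        upper = ([x] + upper)[:-1]
    return out


print(fir_network_f5([1, 2, 3, 4], [1, 0, 0, 0, 0, 0, 0, 0]))
# [0, 0, 0, 1, 2, 3, 4, 0]: the (F4) output [1, 2, 3, 4, ...] three cycles later
```

For N = 4 the output is the (F4) output delayed by three cycles, matching $Z^3(Z^{-1}Y) = Z^2Y$.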
The rigorous proof that the output is correct is its computation. Let $S_J$ denote
the output of such a network as (F5), with J modules. The output of (F5) is
therefore $S = S_4$. We will prove that in general the output of an N-module
network is

$$ S_N = \Big[ \sum_{i=1}^{N} Z^{N-i}\, \Pi(a_i, Z^{2i-2}) \Big] X \qquad (11) $$


From the structure of the network and the modules, as shown in figure (F5),
we get the following relation:

$$ S_J = Z S_{J-1} + \Pi(a_J, Z^{2J-2})\, X \qquad (12) $$

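Equation (11) is proved by induction, starting from $S_0 = 0$. Assume that it
holds for $S_{N-1}$, and use (12) to evaluate $S_N$:

$$ \begin{aligned} S_N &= Z S_{N-1} + \Pi(a_N, Z^{2N-2})\, X \\ &= Z \Big[ \sum_{i=1}^{N-1} Z^{N-1-i}\, \Pi(a_i, Z^{2i-2}) \Big] X + \Pi(a_N, Z^{2N-2})\, X \\ &= \Big[ \sum_{i=1}^{N-1} Z^{N-i}\, \Pi(a_i, Z^{2i-2}) + Z^{N-N}\, \Pi(a_N, Z^{2N-2}) \Big] X \\ &= \Big[ \sum_{i=1}^{N} Z^{N-i}\, \Pi(a_i, Z^{2i-2}) \Big] X \qquad \text{Q.E.D.} \end{aligned} \qquad (13) $$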


If the proof seems too rigorous, one can obtain (11) directly by numbering
the modules from left to right. In the ith module, $a_i$ is used, multiplied by
$Z^{2i-2}X$ (X in module 1, $Z^2X$ in module 2, $Z^4X$ in module 3 and $Z^6X$ in
module 4); the product is then delayed by $Z^{N-i}$ (here, $Z^3$ for module 1, $Z^2$
for module 2, etc.). Hence, the output, S, is the sum of these products
$\Pi(a_i, Z^{2i-2})X$, each delayed by $Z^{N-i}$, as indicated by (11).


Direct methods, compared with rigorous proofs, are simpler and more
intuitive, but require caution. Intuition is known to have been misleading on
occasions.


Equation (11) can be simplified to yield

$$ S = \Big[ \sum_{i=1}^{N} Z^{N-i}\, \Pi(a_i, Z^{2i-2}) \Big] X = \Big[ \sum_{i=1}^{N} \Pi(a_i, Z^{N+i-2}) \Big] X = \Big[ Z^{N-2} \sum_{i=1}^{N} \Pi(a_i, Z^{i}) \Big] X = Z^{N-2}\, Y \qquad (14) $$


Check this network against the design objectives:

(a) Correctness: The correct expression is indeed computed, as shown by
equation (14).

(b) Computation rate: The computation period is now the time required
for a single multiplication followed by a single addition, independent
of the magnitude of N. Since it is easy to overlap the execution of
the multiplication and the addition, we do not attempt to separate
them, even though this may slightly improve the computation period
and the computation rate.

(c) Delay: The computation delay is equal to (N-2) computation cycles,
as shown by (14).

(d) Parts count: The same number of adders and multipliers as before is
needed. However, 3 delays are needed in each module. Hence, the
total parts count is higher (i.e., worse) than before.

(e) Modularity: The modularity is not as good as it is in the network
shown in figure (F4), which includes only components included in
the repeated modules.

In order to improve the modularity, we merge the new delays into the old
modules. In order not to introduce additional delays, we include in each
module the delay which is on its right on the "upper" line, and the delay
which is on its left on the "lower" line. Hence, the network implementation
now is composed of N modules, each as shown in figure (F6), without the
need for any additional components.
By using a network which consists of N modules as shown in figure (F6),
the rate is the best which can be achieved (without separating the
multiplication from the addition) and the delay is proportional to N.

ANOTHER LOOK

At this time we would like to ask if the reader has noticed that a very
important design decision was made without any justification or even
discussion. Please take a moment and recall what has been done so far, and
look for that important design choice which was made as if no alternative
existed.

This design decision is the sequentialization of the summation operator. We
introduced it as a left-to-right sequence of adders without considering other
possibilities.

We can use a tree structure, with $\log_2 N$ depth. Here the carry chain is only
$\log_2 N$ long, which is better than N, but still might be too long. The same
"carry-save" approach may be used again, by using the delay operation, Z,
between every pair of successive adders.

How does this design check against the objectives?

(a) Correctness: The correct expression is indeed computed, as shown
before.

(b) Computation rate: The rate is optimal. As before, we choose not to
split the addition from the multiplications.




(c) Delay: The delay is only $\log_2 N$.

(d) Parts count: The total number of adders required for adding N
numbers is N-1, whether they are arranged in linear order or in a
tree structure. Hence no change in the number of adders is needed.

(e) Modularity: The adders' binary tree is again perpendicular to the data
flow and may impose a severe geometrical problem.

By using the modules shown in figure (F7), one can build this network by
having N type-A modules arranged in a linear order and (N-1) type-B
modules arranged in a binary tree structure.
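A Python sketch of the tree variant follows. The report gives the modules only as a figure, so the details here are our assumptions: the type-A modules form the (F4) tapped delay line, N is a power of two, and a register (a Z) sits at the output of every type-B adder, one per tree level. Under these assumptions the output lags the (F4) output, $Z^{-1}Y$, by exactly $\log_2 N$ cycles. The function name is hypothetical.

```python
def fir_tree(a, xs):
    """Sketch of the pipelined adder-tree variant (N = len(a), a power of two).

    Assumptions made for this illustration: the type-A modules form the (F4)
    tapped delay line (module i multiplies a_i by Z^{i-1} X), and a register
    (a Z) sits at the output of every type-B adder, so each of the log2(N)
    tree levels adds one cycle.
    """
    N = len(a)
    if N < 1 or N & (N - 1):
        raise ValueError("this sketch assumes N is a power of two")
    levels = N.bit_length() - 1                     # log2(N)
    line = [0] * (N - 1)                            # X delays in the type-A modules
    regs = [[0] * (N >> (k + 1)) for k in range(levels)]  # registered adder outputs
    out = []
    for x in xs:                                    # cycle n: x is x_n
        taps = [x] + line                           # taps[k] is Z^k x_n
        inputs = [ai * t for ai, t in zip(a, taps)]  # products from type-A modules
        new_regs = []
        for k in range(levels):
            new_regs.append([inputs[2 * j] + inputs[2 * j + 1]
                             for j in range(len(inputs) // 2)])
            inputs = regs[k]                        # level k+1 reads last cycle's sums
        out.append(inputs[0])                       # root register (lone product if N == 1)
        regs = new_regs                             # clock edge
        line = ([x] + line)[:-1]
    return out


print(fir_tree([1, 2, 3, 4], [1, 0, 0, 0, 0, 0, 0]))  # [0, 0, 1, 2, 3, 4, 0]
```

For N = 4 the two tree levels shift the (F4) response `[1, 2, 3, 4, ...]` by two cycles.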


Figure (F7): Modules for the tree implementation.


AND ANOTHER LOOK

We have considered the left-to-right and the binary tree arrangements. Let
us consider next the right-to-left option. At first, it does not appear to be
different from the left-to-right, but this point is worth verifying.

Let us look at the network shown in figure (F4), with the direction of the
addition reversed. The resulting network is shown in figure (F8).


Note that the networks shown in figures (F4) and (F8) are equivalent, and
therefore the latter suffers from the same problem that the former does.


Figure (F8): The right-to-left addition.


The very same "carry-save" idea can be used again, by adding delays. This
results in the network shown in figure (F9), which is similar to (F5).




Figure  (F9):  Right-to-left  addition  with  delays. 

This new network also has to be checked against the design objectives.

Starting from (a), the correctness, we compute the value of the outputs by
using the same technique of numbering the modules from left to right. Now
we get

$$ S = \Big[ \sum_{i=1}^{N} Z^{i-1}\, \Pi(a_i, Z^{2i-2}) \Big] X \qquad (15) $$

Note that this is very similar to (11), except that the output of the ith
module is delayed now by $Z^{i-1}$ instead of by $Z^{N-i}$ as before, when it was
added to the right.
The simplification of (15) yields

$$ S = \Big[ \sum_{i=1}^{N} Z^{i-1}\, \Pi(a_i, Z^{2i-2}) \Big] X = \Big[ \sum_{i=1}^{N} \Pi(a_i, Z^{3i-3}) \Big] X = \Big[ Z^{-3} \sum_{i=1}^{N} \Pi(a_i, Z^{3i}) \Big] X \qquad (16) $$


This is obviously not the desired Y. Therefore, the network shown in figure
(F9) does not perform the correct computation.
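The failure can also be seen in simulation. This Python sketch (our own; names are hypothetical) models figure (F9) as read from the text: the upper line carries X left to right through two delays per module boundary, so module i sees $Z^{2i-2}X$, while partial sums move right to left through one delay per boundary. Feeding an impulse shows the taps spaced three cycles apart, as (16) predicts, instead of the consecutive taps of an FIR filter.

```python
def fir_network_f9(a, xs):
    """Cycle-by-cycle model of figure (F9): right-to-left addition with the
    same extra delays as (F5).  Returns the output of module 1 each cycle.
    """
    N = len(a)
    upper = [0] * (2 * N - 2)   # upper[k] holds x_{n-k-1}; module i reads Z^{2i-2} X
    lower = [0] * (N - 1)       # lower[i] holds last cycle's sums[i + 1]
    out = []
    for x in xs:                # cycle n: x is x_n
        taps = [x] + upper
        sums = [0] * N
        for i in reversed(range(N)):          # rightmost module first
            carry_in = lower[i] if i < N - 1 else 0
            sums[i] = carry_in + a[i] * taps[2 * i]
        out.append(sums[0])                   # S of (15), leaving module 1
        lower = sums[1:]                      # clock edge: sums move one module left
        upper = ([x] + upper)[:-1]
    return out


print(fir_network_f9([1, 2, 3, 4], [1] + [0] * 11))
# [1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0]: taps at Z^0, Z^3, Z^6, Z^9, as in (16)
```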

Why does the very same approach that worked so well in the network
shown in figure (F6) fail now?

The reason is very simple indeed: in both cases the delays between the adders
(on the "lower" line) are needed in order to make the computation period
independent of N. The purpose of the other delays (on the "upper" line) is to
compensate for the delays on the "lower" line such that the addition is
performed coherently.

Since in the left-to-right network (F5) data flows on both lines (the "lower"
and the "upper") in the same direction, the same delays have to be introduced
in both, to keep the data "in step".

However, in the right-to-left network (F9) data flows on these lines in
opposite directions. Hence, in order to compensate for a delay on the "lower"
line, data should be accelerated on the "upper" line. Since Z is used on the
"lower", $Z^{-1}$ should be used on the "upper".

It is unfortunate that the $Z^{-1}$ operation is a prediction, which we cannot
implement in the general case. However, in this case each $Z^{-1}$ happens to
follow a Z, such that each cancels the effect of the other.

Let us replace on the "upper" line all the intermodule Z operators by $Z^{-1}$.
This cancels the effect of the intramodule Z operators, such that no delays are
needed on this line.


Figure (F10) shows the modified network.


Figure  (F10):  The  modified  right-to-left  network. 


Again, the new design has to be checked against all the design objectives.
Starting with (a), the correctness, we get

$$ S = \sum_{i=1}^{N} Z^{i-1}\, \Pi(a_i, X) = Z^{-1} \sum_{i=1}^{N} Z^{i}\, \Pi(a_i, X) = \Big[ Z^{-1} \sum_{i=1}^{N} \Pi(a_i, Z^{i}) \Big] X = Z^{-1}\, Y \qquad (17) $$


This proves the correctness and also shows that there is no delay whatsoever.
We also know that the computation period is minimal, since it is equal to the
longest "atomic" operation. The parts count is lower than in the other designs
that achieve this computation rate, and the network is modular.

Based on the above, this design is optimal with respect to correctness, (a),
computation rate, (b), and delay, (c), and it also scores highly in the parts
count, (d), and the modularity, (e), categories.

An alternative way to draw this network is shown in figure (F11). Note
that the addition is performed, again, in the left-to-right direction, because
the order of the $a_i$'s is reversed.
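A Python sketch of figure (F10) (our own model; names are hypothetical): X reaches every module in the same cycle, and the partial sums move right to left through one delay per module. Per (17), the value produced while $x_n$ is presented should be $y_{n+1}$. This arrangement is what is commonly called the transposed direct form of an FIR filter.

```python
def fir_network_f10(a, xs):
    """Cycle-by-cycle model of figure (F10): X is broadcast to all modules;
    partial sums move right to left through one register per boundary.
    """
    N = len(a)
    lower = [0] * (N - 1)       # lower[i] holds last cycle's sums[i + 1]
    out = []
    for x in xs:                # cycle n: x is x_n, seen by every module
        sums = [0] * N
        for i in reversed(range(N)):
            carry_in = lower[i] if i < N - 1 else 0
            sums[i] = carry_in + a[i] * x
        out.append(sums[0])     # y_{n+1}, i.e. Z^{-1} Y as in (17)
        lower = sums[1:]        # clock edge
    return out


print(fir_network_f10([1, 2, 3, 4], [1, 0, 0, 0, 0, 0]))  # [1, 2, 3, 4, 0, 0]
```

Only the partial-sum line holds registers here, and the critical path through a module is one multiplication plus one addition, whatever N is.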
APPLYING THE Z-NOTATION TO DESIGN EVALUATION

We will show that the Z-notation can be used for the evaluation of all the
networks shown before, from figure (F1) to figure (F10). We also claim that
this evaluation can (and should) be performed without the aid of figures
and intuition.

Let us review the systems which we have discussed so far.

System (A) is the one which resulted directly from the definition, and is
shown in figure (F1) through figure (F3). Its representation is

$$ \text{System (A):} \quad Y = \Big[ \sum_{i=1}^{N} \Pi(a_i, Z^i) \Big] X \qquad (18) $$


Using our experience with this kind of network, we noted that one delay
could be saved, and we transformed this network into system (B), which is
the one shown in figure (F4). Its representation is

$$ \text{System (B):} \quad Y = \Big[ Z \sum_{i=0}^{N-1} \Pi(a_{i+1}, Z^i) \Big] X \qquad (19) $$


Then, in order to improve the rate, we further transformed the network into
system (C), the one shown in figure (F5), whose representation is

$$ \text{System (C):} \quad Y = \Big[ Z^{-(N-2)} \sum_{i=1}^{N} Z^{N-i}\, \Pi(a_i, Z^{2i-2}) \Big] X \qquad (20) $$
Then we introduced the right-to-left addition and were able to transform this
system into system (D), the one shown in figure (F10), whose representation
is

$$ \text{System (D):} \quad Y = Z \sum_{i=1}^{N} Z^{i-1}\, \Pi(a_i, X) \qquad (21) $$

Next, we compare and evaluate these systems by using their representations,
without referring to the figures.
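The comparison that follows can be mechanized. As a sketch (our own; the representation and names are hypothetical), an operator $\sum_k c_k Z^k$ is stored as a dictionary from the power k to the constants attached to it, and each of (18)-(21) is written as $Y = Z^k[\text{inner}]X$, where the network computes inner X. Correctness (a) is then a dictionary comparison, and the network's delay relative to (A) is $-k$.

```python
from collections import Counter


def term(coeff, power):
    """Pi(coeff, Z^power) as {power of Z: Counter of constant names}."""
    return {power: Counter({coeff: 1})}


def add(*ops):
    """Sum of operators: collect the constants attached to each power of Z."""
    total = {}
    for op in ops:
        for p, c in op.items():
            total[p] = total.get(p, Counter()) + c
    return total


def shift(k, op):
    """Z^k applied to an operator; by property (ii) Z passes the constants a_i."""
    return {p + k: c for p, c in op.items()}


def systems(N):
    """(k, inner) for (18)-(21): Y = Z^k [inner] X, i.e. the network computes inner X."""
    def a(i):
        return f"a{i}"
    return {
        "A": (0, add(*(term(a(i), i) for i in range(1, N + 1)))),
        "B": (1, add(*(term(a(i + 1), i) for i in range(N)))),
        "C": (-(N - 2), add(*(shift(N - i, term(a(i), 2 * i - 2)) for i in range(1, N + 1)))),
        "D": (1, add(*(shift(i - 1, term(a(i), 0)) for i in range(1, N + 1)))),
    }


N = 4
target = systems(N)["A"][1]                # sum_i Pi(a_i, Z^i), i.e. Y / X
for name, (k, inner) in systems(N).items():
    assert shift(k, inner) == target       # (a): each system computes Y
    print(name, "delay", -k)               # the network outputs Z^{-k} Y
```

For N = 4 this prints delays 0, -1, 2 and -1 for (A) through (D), the values of the summary table below with N-2 = 2.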


(a) Correctness: From the representations above it is evident that all of
these systems perform the correct computation.

(b) Rate: Both (A) and (B) require adding N quantities at once.
Therefore, their computation period is equal to the time required for
a multiplication followed by the addition of N numbers, whereas (C)
and (D) require only the time needed for a multiplication and a
single addition.

(c) Delay: In (A) $y_n$ is available in the same cycle as $x_n$. We use this
for delay reference, and denote it as zero delay.

In (B) the entire expression on the right-hand side is multiplied by
the delay Z. This means that the output of the network that
computes this expression has to be delayed one cycle in order to have
the same delay as in (A), the zero delay. Hence, without this
additional delay, the output, Y, is advanced by one cycle, so its delay
is -1 cycle. This means it is earlier than (A) by one cycle.

On the other hand, (C) requires $Z^{-(N-2)}$ in order to achieve the same
delay. Since this is not feasible to implement, the Y computed by
this network is delayed by (N-2) cycles, compared with (A).

(D) has, obviously, the same delay as (B). Thus, (D) also is earlier
than (A) by one cycle.

In summary, in the general case, the delays are

System implementation:   A    B    C      D
Delay (in cycles):       0    -1   N-2    -1
However, even though both (B) and (D) have the same delay in
cycles, (D) has a smaller delay since its cycle is shorter. Hence, in
this implementation, $y_{n+1}$ is available a shorter time after $x_n$ is
given, compared with (B).

(d) Parts count: The modular implementations, including the additional
delays and the additional adders (which may be required on either
end of the network in order to achieve the modularity), are compared
with each other.

All four implementations require N multipliers and N adders (or N
multiply-and-add units). They differ only in the delay requirements.

Both (A) and (B) require N delays for X.

(C) requires 2N delays for X, and N delays for the partial sums of
the products. These delay units require, in general, more capacity
(bits) than those for delaying X, especially if fixed-point arithmetic is
used.

(D) requires N delays for the partial sums of the products.

(e) Modularity and simplicity: All four implementations are equally
modular, with the same level of complexity.

The rating of these systems is summarized in the following table. S > T
means that S is better than T.

(a) Correctness:    (A) = (B) = (C) = (D)
(b) Data rate:      (C) = (D) > (A) = (B)
(c) Delay:          (D) > (B) > (A) > (C)
(d) Parts count:    (A) = (B) > (D) > (C)
(e) Modularity:     (A) = (B) = (C) = (D)

This shows that (D) is the best design if performance is the major objective,
but (B) is the best design if the parts count is the major one.


4. MULTIPLICATION OF POLYNOMIALS


The previous example, the FIR filter, was designed by using intuition to
operate on computational networks represented by drawings. The Z-notation
could be used, but is less intuitive.

Next we compute multiplication and division of polynomials, and design
computational networks to implement these operations. However, now we use
the Z-notation for the design of the networks, and use diagrams only to
demonstrate the design.

THE PROBLEM OF MULTIPLICATION OF POLYNOMIALS

Let A(t) and X(t) be polynomials in t, of degrees c and m, respectively:

$$ A(t) = \sum_{i=0}^{c} a_i t^i \;; \qquad X(t) = \sum_{i=0}^{m} x_i t^i \qquad (22) $$

Let Y(t) be the product polynomial of A(t) and X(t):

$$ Y(t) = \sum_{i=0}^{m+c} y_i t^i = \Big( \sum_{i=0}^{c} a_i t^i \Big) \Big( \sum_{i=0}^{m} x_i t^i \Big) \qquad (23) $$

By equating the coefficients of $t^n$ we get

$$ y_n = \sum_{i=0}^{c} a_i\, x_{n-i} \qquad (x_i = 0 \text{ for } i < 0 \text{ and } i > m) \qquad (24) $$

We are interested in finding the coefficient set of the polynomial Y(t) from
the given coefficient sets of A(t) and X(t). We are not interested in
evaluating any of these polynomials for particular values of t.

In many applications A(t) is a fixed polynomial, and X(t) is a variable one.
The computation problem is to compute the m+c+1 coefficients of Y(t) from the
given m+1 coefficients of X(t) and the fixed c+1 coefficients of A(t).


Since (24) is identical to (1), except for the boundary condition and the
range, the same networks that compute the FIR filter can also perform this
polynomial multiplication.

Since (24) contains $a_0$, one more stage is needed, and the computation is
performed such that $y_n$ is available in the cycle when $x_n$ is given. In other
words, the delay now is 0, instead of the -1 cycle we had before.

Figure (F12) shows the network for this computation. Note that it starts
with $a_0$ (compared with $a_1$ in the previous network) and that its output is Y
(compared with $Z^{-1}Y$ before). Because of the boundary conditions it is
important to clear all the delay units before starting the operation, and to
provide $x_i = 0$ for $i = m+1, m+2, \ldots, m+c$. When these values are given, the last c
values of Y are obtained. Since there are m+c+1 values of Y, and only m+1 values
of X, this "runout" operation is indeed expected.

The initial clearing can be performed, just like the runout operation, by
providing the network with c zero-values for X. During this period the
obtained Y values are invalid.

Obviously, this network is represented by

$$ Y = \sum_{i=0}^{c} Z^{i}\, \Pi(a_i, X) \qquad (25) $$

Figure (F12): Polynomial multiplication.
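A Python sketch of (25) and figure (F12) (our own model; names are hypothetical): coefficients are stored lowest order first, the registers start cleared (standing in for the initial clearing), and c zeros are appended to X for the runout, so all m+c+1 coefficients of Y emerge, one per cycle.

```python
def poly_mul_network(a, x):
    """Model of (25), Y = sum_{i=0}^{c} Z^i Pi(a_i, X), as a chain of c+1 modules.

    a: a_0..a_c and x: x_0..x_m, lowest order first.  Every module receives
    x_n in the same cycle; partial sums move right to left through one
    register (Z) per module boundary.
    """
    c = len(a) - 1
    lower = [0] * c                    # lower[i] holds last cycle's sums[i + 1]
    y = []
    for xn in list(x) + [0] * c:       # runout: x_{m+1} = ... = x_{m+c} = 0
        sums = [0] * (c + 1)
        for i in reversed(range(c + 1)):
            carry_in = lower[i] if i < c else 0
            sums[i] = carry_in + a[i] * xn
        y.append(sums[0])              # y_n, in the same cycle as x_n
        lower = sums[1:]               # clock edge
    return y


print(poly_mul_network([1, 2], [3, 4]))  # (1 + 2t)(3 + 4t): [3, 10, 8]
```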
REVERSING THE ORDER OF X


In several applications it is preferred that $x_m$ is available before $x_0$. In
these cases $x_m$ is leading and $x_0$ trailing.

If this order is used, then the operator Z has a predicting role, and $Z^{-1}$ is a
delay. Since (25) is implemented with positive powers of Z, another
implementation which uses only negative powers of Z is needed.

Multiply (25) by $Z^{-c}$ and get

$$ Z^{-c}\, Y = \sum_{i=0}^{c} Z^{i-c}\, \Pi(a_i, X) = \sum_{j=0}^{c} Z^{-j}\, \Pi(a_{c-j}, X) \qquad (26) $$


Since this has the same structure as (25), the same network can be used to perform this operation, except for the following three conditions:

(i) $Z^{-1}$ is used instead of Z. However, since Z meant a delay before, and $Z^{-1}$ means a delay now, this is no real change of function, only of labeling.

(ii) The order of the $a_i$'s is reversed, because we now have $a_{c-j}$ where we had $a_j$ before.

(iii) The output is now $Z^{-c}Y$, instead of Y as before.


This means that when $x_n$ is given to the network, $y_{n+c}$ is available. Therefore, when $x_m$, the leading coefficient of X, is made available to the network, then $y_{m+c}$, the leading coefficient of Y, is computed. The resulting network is shown in figure (F13).




Figure (F13): Polynomial multiplication (most significant term leading).
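Reversing the input order changes only the coefficient order and the labeling of the delay, which a small sketch can make concrete. The hypothetical `multiply_msb_first` below keeps the structure of network (25) but multiplies stage j by $a_{c-j}$, feeds $x_m$ first, and returns the Y coefficients in the order the network produces them, $y_{m+c}$ first. The list representation and example values are assumptions for illustration.

```python
# A sketch with hypothetical names, not the report's own implementation.

def multiply_msb_first(a, x):
    """Model of network (26)/(F13): Z^{-c} Y = sum_j Z^{-j} Pi(a_{c-j}, X).

    a, x: ascending coefficient lists (a_0 first, x_0 first).
    The network is fed x_m, ..., x_0 followed by c zeros (runout);
    the returned stream is y_{m+c}, ..., y_0 in the order produced.
    """
    c = len(a) - 1
    coef = a[::-1]                # stage j multiplies by a_{c-j}
    regs = [0] * (c + 1)          # c cleared delay units plus a constant-zero sentinel
    out = []
    for x_in in list(reversed(x)) + [0] * c:
        out.append(coef[0] * x_in + regs[0])   # when x_n enters, y_{n+c} leaves
        for j in range(1, c + 1):
            regs[j - 1] = coef[j] * x_in + regs[j]
    return out


a = [1, -2, 3, 5]       # hypothetical A(t), c = 3
x = [2, 0, -1, 4]       # hypothetical X(t), m = 3
y_desc = multiply_msb_first(a, x)
print(y_desc[0] == a[-1] * x[-1])   # True: the first output is y_{m+c} = a_c x_m
```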


COMPUTING THE SUM OF POLYNOMIAL PRODUCTS

Consider the problem of computing W(t), which is defined by

$$W(t) = A(t)\,X(t) + B(t)\,Y(t) \qquad (27)$$

where A(t) and B(t) are of degree c, and X(t) and Y(t) are of degree m. Obviously, W(t) is of degree at most m+c.

By using (26) we may get

$$Z^{-c}W = \sum_{j=0}^{c} Z^{-j}\,\Pi(a_{c-j}, X) + \sum_{j=0}^{c} Z^{-j}\,\Pi(b_{c-j}, Y) \qquad (28)$$

This yields, for c=3, the network shown in figure (F14). However, (28) may also be written as

$$Z^{-c}W = \sum_{j=0}^{c} Z^{-j}\left[\Pi(a_{c-j}, X) + \Pi(b_{c-j}, Y)\right] \qquad (29)$$

which yields the combined network shown in figure (F15).


Figure (F14): Sum of polynomial products.
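A sketch of the combined network (29), assuming (as we read the bracket in (29)) that the two products of each stage are added before one shared delay unit, so a single chain of c delay units serves both products. The name `sum_of_products_network` and the example polynomials are hypothetical; X and Y are fed together, most significant coefficient first.

```python
# A sketch with hypothetical names, not the report's own implementation.

def sum_of_products_network(a, b, x, y):
    """Model of (29): Z^{-c} W = sum_j Z^{-j} [Pi(a_{c-j}, X) + Pi(b_{c-j}, Y)].

    a, b: ascending coefficients of A(t), B(t), both of degree c
    x, y: ascending coefficients of X(t), Y(t), both of degree m
    Returns w_0..w_{m+c} in ascending order.
    """
    c = len(a) - 1
    ra, rb = a[::-1], b[::-1]        # stage j uses a_{c-j} and b_{c-j}
    regs = [0] * (c + 1)             # one shared chain of c delay units + zero sentinel
    pad = [0] * c                    # runout
    out = []
    for x_in, y_in in zip(list(reversed(x)) + pad, list(reversed(y)) + pad):
        out.append(ra[0] * x_in + rb[0] * y_in + regs[0])
        for j in range(1, c + 1):
            # both products of stage j are summed before a single delay unit
            regs[j - 1] = ra[j] * x_in + rb[j] * y_in + regs[j]
    return out[::-1]                 # produced w_{m+c} first; return w_0 first


print(sum_of_products_network([1, 2], [3, -1], [1, 1], [2, 0]))   # [7, 1, 2]
```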
5. DIVISION OF POLYNOMIALS


THE PROBLEM OF DIVISION OF POLYNOMIALS

Polynomial division is obviously the inverse of polynomial multiplication.

The division is defined in the usual way, by the relation

$$Y(t) = A(t)\,X(t) \qquad (a_c \ne 0) \qquad (30)$$

where A(t) and Y(t) are given polynomials of degree c and m+c, respectively.

X(t), which is to be determined, is a polynomial of degree m.

Division, unlike multiplication, can be performed only by starting with the most significant (highest power) term of Y. This asymmetry is due to requiring only that the leading coefficient of A(t) be nonzero; starting from the least significant term would require dividing by $a_0$, which may be zero.

Therefore, we use (26) and not (25) in order to invert the multiplication.

Equation (26) states

$$Z^{-c}Y = \sum_{i=0}^{c} Z^{-i}\,\Pi(a_{c-i}, X) \qquad (26)$$

Since the operation has to be performed from the most significant to the least significant term, at any stage in the computation of X(t), the higher order terms of X(t) must already be known.

Therefore, we seek to express X by using A, Y and $Z^{-i}X$ for positive values of i, but not including $Z^{0}X$.

Extract the $Z^{0}X$ term from (26) and get

$$Z^{-c}Y = \Pi(a_c, X) + \sum_{i=1}^{c} Z^{-i}\,\Pi(a_{c-i}, X) \qquad (31)$$

Isolate it and get

$$\Pi(a_c, X) = Z^{-c}Y + \sum_{i=1}^{c} Z^{-i}\,\Pi(-a_{c-i}, X) \qquad (32)$$

In order to share the $Z^{-c}$ operation, this can be transformed into

$$\Pi(a_c, X) = \sum_{i=1}^{c} Z^{-i}\left[\Pi(-a_{c-i}, X) + \delta_{i,c}\,Y\right] \qquad (33)$$

where $\delta_{i,c} = 0$ if $i \ne c$ and $\delta_{c,c} = 1$.

Since $a_c \ne 0$, X can be expressed explicitly by

$$X = a_c^{-1}\sum_{i=1}^{c} Z^{-i}\left[\Pi(-a_{c-i}, X) + \delta_{i,c}\,Y\right] \qquad (34)$$

The network for performing this computation is shown in Figure (F16).


Figure (F16): Polynomial division, for c = 3.
Since X is synchronized with Y, $x_j$ is computed and is available in the same cycle in which $y_j$ is given. Since the first coefficient of Y is $y_{m+c}$, and the first coefficient of X is $x_m$, no $x_i$ is output during the first c cycles.

Before starting this operation all the Z units are cleared. Then the Y coefficients are given, one at a time (i.e., one per cycle). The first c cycles are initialization cycles, and no output is expected. During the next m+1 cycles the coefficients of X, with $x_m$ leading and $x_0$ trailing, are available.

At this point the Z units contain the same data that was present in the Z units of the network shown in Figure (F13) just before the multiplication process started.

Since all the Z units in that network were cleared before the multiplication, all the Z units should contain zeroes after the division. If any of them is found to contain a nonzero value, then Y(t) was not a product of A(t) by any polynomial.

In fact, the values in the c delay units are the coefficients of the remainder polynomial, R(t), whose degree is less than c. This polynomial is defined by

$$R(t) = Y(t) - A(t)\,X(t) \qquad (35)$$
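The division network (34) can be simulated in the same cycle-by-cycle style. In this sketch the names are hypothetical, and exact `fractions.Fraction` arithmetic is an assumption made so that division by $a_c$ is exact. Y is fed with $y_{m+c}$ first, the first c outputs are discarded as initialization cycles, and the final contents of the delay units are read out as the remainder coefficients of (35), lowest order first.

```python
from fractions import Fraction

# A sketch with hypothetical names, not the report's own implementation.

def divide_network(a, y):
    """Model of network (34)/(F16), assuming c >= 1 and a_c != 0.

    a: ascending coefficients a_0..a_c of A(t)
    y: ascending coefficients y_0..y_{m+c} of Y(t); y_{m+c} is fed first
    Returns (x_0..x_m, r_0..r_{c-1}).
    """
    c = len(a) - 1
    regs = [Fraction(0)] * c                # delay units, cleared before starting
    stream = []
    for y_in in reversed(y):                # one Y coefficient per cycle, highest first
        x_out = regs[0] / a[c]              # X is synchronized with Y
        stream.append(x_out)
        feed = regs[1:] + [Fraction(y_in)]  # Y enters the last stage only (delta_{i,c})
        regs = [-a[c - i] * x_out + feed[i - 1] for i in range(1, c + 1)]
    x = stream[c:][::-1]                    # skip the c initialization cycles; x_0 first
    return x, regs[::-1]                    # regs held r_{c-1}, ..., r_0


def convolve(a, x):
    """Reference polynomial product used to build a test dividend."""
    y = [0] * (len(a) + len(x) - 1)
    for i, a_i in enumerate(a):
        for n, x_n in enumerate(x):
            y[i + n] += a_i * x_n
    return y


a = [1, -2, 3, 5]                 # hypothetical A(t), c = 3
x = [2, 0, -1, 4]                 # hypothetical X(t), m = 3
q, r = divide_network(a, convolve(a, x))
print(q == x, all(v == 0 for v in r))   # True True
```

Adding a polynomial of degree less than c to Y leaves the quotient unchanged and shows up only in the delay units, as (35) describes.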


CHECKING THE MULTIPLICATION AND THE DIVISION


In order to check (which is weaker than "verify") these operations, we prove that if we use these networks first to multiply an arbitrary polynomial, X(t), by the given polynomial, A(t), and then to divide this product by the same given polynomial, A(t), then the same arbitrary polynomial X(t) results.

Let Y(t) be the result of the multiplication of X(t) by A(t), and let S(t) be the result of the division of Y(t) by A(t). We will prove that $S(t) \equiv X(t)$.
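From (32), applied to the division network whose output is S,

$$S = a_c^{-1}\left[Z^{-c}Y + \sum_{i=1}^{c} Z^{-i}\,\Pi(-a_{c-i}, S)\right]$$

and, substituting (26) for $Z^{-c}Y$,

$$
\begin{aligned}
S &= a_c^{-1}\left[\sum_{i=0}^{c} Z^{-i}\,\Pi(a_{c-i}, X) - \sum_{i=1}^{c} Z^{-i}\,\Pi(a_{c-i}, S)\right] \\
  &= a_c^{-1}\left[a_c X + \sum_{i=1}^{c} Z^{-i}\,\Pi(a_{c-i}, X) - \sum_{i=1}^{c} Z^{-i}\,\Pi(a_{c-i}, S)\right] \\
  &= X + a_c^{-1}\sum_{i=1}^{c} Z^{-i}\,\Pi(a_{c-i}, X-S) \\
  &= S + a_c^{-1}a_c\,(X-S) + a_c^{-1}\sum_{i=1}^{c} Z^{-i}\,\Pi(a_{c-i}, X-S) \\
  &= S + a_c^{-1}\sum_{i=0}^{c} Z^{-i}\,\Pi(a_{c-i}, X-S) \qquad (36)
\end{aligned}
$$

Hence

$$\sum_{i=0}^{c} Z^{-i}\,\Pi(a_{c-i}, X-S) = 0 \qquad (37)$$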

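By (26), the left-hand side of (37) is $Z^{-c}$ applied to the product of A(t) and X(t) - S(t), so this product is zero. Since the polynomial A(t) is known not to be the zero polynomial, because $a_c \ne 0$, the polynomial X(t) - S(t) must be zero, i.e., S(t) must be equal to X(t). Q.E.D.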
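The check can also be exercised numerically. The sketch below uses hypothetical names and random integer test polynomials with $a_c \ne 0$. It multiplies with a model of the network of (26), divides with a model of the network of (34), and confirms that S(t) = X(t) and that every delay unit ends at zero. A passing run is only evidence; the argument above is what establishes the result.

```python
import random
from fractions import Fraction

# A sketch with hypothetical names, not the report's own implementation.

def multiply_msb_first(a, x):
    """Network (26)/(F13): feeds x_m first, returns y_0..y_{m+c} (ascending)."""
    c, coef = len(a) - 1, a[::-1]
    regs, out = [0] * (c + 1), []
    for x_in in list(reversed(x)) + [0] * c:      # c zeros of runout
        out.append(coef[0] * x_in + regs[0])
        for j in range(1, c + 1):
            regs[j - 1] = coef[j] * x_in + regs[j]
    return out[::-1]


def divide_msb_first(a, y):
    """Network (34)/(F16): feeds y_{m+c} first, returns (s_0..s_m, delay units)."""
    c = len(a) - 1
    regs, stream = [Fraction(0)] * c, []
    for y_in in reversed(y):
        s_out = regs[0] / a[c]
        stream.append(s_out)
        feed = regs[1:] + [Fraction(y_in)]
        regs = [-a[c - i] * s_out + feed[i - 1] for i in range(1, c + 1)]
    return stream[c:][::-1], regs[::-1]


def check_round_trip(trials=200, seed=0):
    """Multiply random X(t) by random A(t), divide back, compare S(t) with X(t)."""
    rng = random.Random(seed)
    for _ in range(trials):
        c, m = rng.randint(1, 5), rng.randint(0, 6)
        a = [rng.randint(-9, 9) for _ in range(c)] + [rng.choice([-3, -2, -1, 1, 2, 3])]
        x = [rng.randint(-9, 9) for _ in range(m + 1)]
        s, leftover = divide_msb_first(a, multiply_msb_first(a, x))
        if s != x or any(leftover):               # S must equal X; all Z units zero
            return False
    return True


print(check_round_trip())   # True
```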


SIMULTANEOUS MULTIPLICATION AND DIVISION OF POLYNOMIALS

Define S(t) to be the polynomial obtained by multiplying the arbitrary polynomial X(t) by the given polynomial A(t), and then dividing this product by another given polynomial, B(t), also of degree c, such that $b_c \ne 0$.

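By following (36) we get

$$
\begin{aligned}
S &= b_c^{-1}\left[\sum_{i=0}^{c} Z^{-i}\,\Pi(a_{c-i}, X) + \sum_{i=1}^{c} Z^{-i}\,\Pi(-b_{c-i}, S)\right] \\
  &= b_c^{-1}\left\{a_c X + \sum_{i=1}^{c} Z^{-i}\left[\Pi(a_{c-i}, X) + \Pi(-b_{c-i}, S)\right]\right\} \qquad (38)
\end{aligned}
$$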


The  network  which  performs  this  computation  is  shown  in  figure  (F17). 


Figure (F17): The $S = (A\,X)/B$ implementation, for c = 3.
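A cycle-level sketch of network (38), assuming $\deg A = \deg B = c \ge 1$ and exact `Fraction` arithmetic; `mul_div_network` and the example polynomials are hypothetical. The example chooses X(t) = B(t)Q(t), so the expected output is A(t)Q(t) and the shared delay units should end at zero.

```python
from fractions import Fraction

# A sketch with hypothetical names, not the report's own implementation.

def mul_div_network(a, b, x):
    """Model of (38)/(F17): S = (A X) / B, with b_c != 0.

    a, b, x: ascending coefficient lists; x_m is fed first.
    Returns (s_0..s_m, delay-unit contents read out lowest order first).
    """
    c = len(a) - 1
    regs = [Fraction(0)] * c              # shared delay units, cleared before starting
    stream = []
    for x_in in reversed(x):
        s_out = (a[c] * x_in + regs[0]) / b[c]
        stream.append(s_out)
        feed = regs[1:] + [Fraction(0)]
        # stage i adds Pi(a_{c-i}, X) and Pi(-b_{c-i}, S) before its delay unit
        regs = [a[c - i] * x_in - b[c - i] * s_out + feed[i - 1] for i in range(1, c + 1)]
    return stream[::-1], regs[::-1]


a = [2, 1, -1]              # hypothetical A(t), c = 2
b = [1, 3, 1]               # hypothetical B(t), b_c = 1
x = [4, 10, -1, 1, 1]       # X(t) = B(t) Q(t) with Q(t) = 4 - 2t + t^2
s, leftover = mul_div_network(a, b, x)
print(s == [8, 0, -4, 3, -1], leftover == [0, 0])   # True True: S = A(t) Q(t)
```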
6. SYNTHETIC APERTURE RADAR


The next example, discussed in this section, is taken from synthetic aperture radar (SAR) data processing. This SAR application will be introduced first, and then a design for its implementation will be discussed.


THE SAR PROBLEM

Consider a moving platform, such as an aircraft or a spacecraft, travelling along a straight line. Every period of NT it transmits a radar burst, whose echo is recorded N times, a period T apart. Typical numbers are N in the thousands and T = 100 nanoseconds, which corresponds to a sampling rate of $F_s$ = 10 MHz.

Let i be the serial number of a given burst, and let j be the serial number of a given echo return inside it. The value of j varies between 0 and N-1. The value of i starts at 0 and is continuously increased, as long as the platform is in motion. The data D(i,j) is recorded at the time $t = (Ni+j)T$.

We use the notation $k = (i,j) = Ni+j$, which is very useful because the data is recorded in a one-dimensional serial sequence. We omit the T from the notation, i.e., time is measured in units of T. Similarly, the Z operator is a delay by this unit.

Note that we revert to the original notation, where the input D(k) precedes the input D(k+1). Hence, Z is again the delay operator, and $Z^{-1}$ is the predictor, which should not be applied to external input data.

We refer to (i,*) as columns, and to (*,j) as rows. Hence, there are N rows, which are parallel to the platform trajectory, and the columns, which correspond to the radar bursts, are perpendicular to the trajectory.

The purpose of collecting the data set {D(i,j)} is to use it for the computation of the "surface function" F(i,j), defined by

$$F(i,j) = \sum_{k=-m}^{m} a_k\,D(i-k, j) \qquad (37)$$

for the fixed set of coefficients $\{a_k \mid -m \le k \le +m\}$.
This is a weighted average of D(i,j) with its neighbors in the same jth row, up to m columns on each side.
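Definition (37) can be evaluated directly on a small array, which gives a reference point for the network designs. This sketch assumes D is stored as `D[i][j]` (burst i, echo j) and, as a simplifying assumption not taken from the report, evaluates F only for bursts that have m neighbors on each side; `surface_function` and the data are hypothetical.

```python
# A sketch with hypothetical names, not the report's own implementation.

def surface_function(a, D, m):
    """Direct evaluation of (37): F(i, j) = sum_{k=-m}^{m} a_k D(i-k, j).

    a: list of 2m+1 weights, a[k + m] holding a_k
    D: D[i][j], burst i (column), echo j (row)
    Returns rows F(i, 0..N-1) for m <= i < len(D) - m only (edge bursts skipped).
    """
    N = len(D[0])
    return [[sum(a[k + m] * D[i - k][j] for k in range(-m, m + 1)) for j in range(N)]
            for i in range(m, len(D) - m)]


D = [[1, 2], [3, 4], [5, 6]]                      # hypothetical: 3 bursts, N = 2
print(surface_function([1, 10, 100], D, 1))       # [[135, 246]]
```

With unequal weights the example shows the direction of the sum: $a_1 = 100$ multiplies the previous burst, D(i-1, j).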

The definition (37) is an extreme simplification of the actual SAR problem. For simplicity many crucial details are omitted. Among these complex details are the dependence of the $\{a_k\}$ on the position inside the burst (the j value) and the effects of the angle between the trajectory of the platform and the motion of the planet. These and other details are very important for the actual SAR process, but do not contribute to the ideas discussed in this report.


THE DESIGN OF THE NETWORK

When applying the Z operator to the data we get

$$Z\,D(i,j) = D(i, j-1) \text{ for } j > 0, \quad \text{and} \quad Z\,D(i,0) = D(i-1, N-1) \qquad (38)$$

and

$$Z^{N} D(i,j) = D(i-1, j) \qquad (39)$$

Substitute (39) in (37) and get

$$F(i,j) = \sum_{k=-m}^{m} \Pi(a_k, Z^{kN})\,D(i,j) \qquad (40)$$


$$F = \left[\sum_{k=-m}^{m} \Pi(a_k, Z^{kN})\right] D = \left[Z^{-mN}\sum_{k=0}^{2m} \Pi(a'_k, Z^{kN})\right] D = \left[Z^{-mN}\sum_{k=0}^{M-1} \Pi(a'_k, Z^{kN})\right] D \qquad (41)$$

where $M = 2m+1$ and $a'_k$ is defined by $a'_k = a_{k-m}$.

Since we cannot implement the $Z^{-mN}$ operation, the best which we can compute is $Z^{mN}F$, which is F lagging m(NT) time behind the input data sequence D. This is to be expected, since the definition of F(i,j), (37), requires data which is m bursts on each side (past and future).
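The bracketed operator of (41) can be applied directly to the serial stream with a tapped delay line on the raw data. In this sketch (hypothetical name `direct_form_stream`), the list `a` holds $a_{-m}, \ldots, a_m$, so that `a[k]` is $a'_k$, and `d[N*i + j]` holds D(i,j). Each output equals F at a serial index mN samples earlier, i.e. the stream is $Z^{mN}F$.

```python
from collections import deque

# A sketch with hypothetical names, not the report's own implementation.

def direct_form_stream(a, d, N, m):
    """Apply [sum_{k=0}^{M-1} Pi(a'_k, Z^{kN})] to the serial stream d.

    a: a_{-m}..a_m, so a[k] is a'_k = a_{k-m}
    d: serial data, d[N*i + j] = D(i, j)
    Output t equals F at serial index t - m*N (once enough data has arrived).
    """
    M = 2 * m + 1
    length = (M - 1) * N + 1
    line = deque([0] * length, maxlen=length)   # raw-data delay line, cleared at start
    out = []
    for sample in d:
        line.appendleft(sample)                 # line[t] now holds the sample from t cycles ago
        out.append(sum(a[k] * line[k * N] for k in range(M)))
    return out


D = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]   # hypothetical: 4 bursts, N = 3
d = [v for burst in D for v in burst]                  # serial order N*i + j
out = direct_form_stream([1, 10, 100], d, N=3, m=1)
print(out[3 + 3])   # 147 = F(1, 0) = 1*7 + 10*4 + 100*1, emitted m*N = 3 samples late
```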

Since (41) is very similar to (1), (9) and (21), we already know how to compute it. Equation (41) can also be written as
$$Z^{mN}F = \left[\sum_{k=0}^{M-1} \Pi(a'_k, Z^{kN})\right] D \qquad (42)$$

which is basically like the FIR filter equations of the third section, except that $Z^N$ is used here and Z is used there. Hence, following the third section (the FIR filter example), we get the fastest implementation, represented by

$$Z^{mN}F = \sum_{k=0}^{M-1} Z^{kN}\,\Pi(a'_k, D) \qquad (43)$$

It  is  left  for  the  interested  reader  to  check  this  design  against  the  design 
objectives,  (a)  through  (r). 

As mentioned in an earlier section, the $Z^N$ operators in (43) are more expensive than those of (42), because they store products, which usually (especially in fixed-point arithmetic implementations) have more bits of information than the raw data signal, D.

The reason for moving the Z operators from the raw data to the partial sums of the products, where they are more expensive, is to separate the adders in order to avoid long carry-chain propagation, and thus to improve the computation rate.
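Form (43) keeps the same products but moves the $Z^N$ delays onto the partial sums. In this sketch (hypothetical name `transposed_stream`; `a[k]` is $a'_k$ and `d[N*i + j]` is D(i,j)), each $Z^N$ is modeled as an N-entry FIFO of partial sums, so every adder is followed by a register.

```python
from collections import deque

# A sketch with hypothetical names, not the report's own implementation.

def transposed_stream(a, d, N, m):
    """Model of (43): Z^{mN} F = sum_k Z^{kN} Pi(a'_k, D).

    Each Z^N is an N-entry FIFO holding partial sums of products.
    """
    M = 2 * m + 1
    fifos = [deque([0] * N) for _ in range(M - 1)]  # fifos[k-1]: the Z^N after stage k
    out = []
    for sample in d:
        heads = [f[0] for f in fifos]               # partial sums that entered N cycles ago
        out.append(a[0] * sample + (heads[0] if fifos else 0))
        for k in range(1, M):
            later = heads[k] if k < M - 1 else 0    # delayed partial sum from stage k+1
            fifos[k - 1].popleft()
            fifos[k - 1].append(a[k] * sample + later)
    return out


D = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]   # hypothetical: 4 bursts, N = 3
d = [v for burst in D for v in burst]
out = transposed_stream([1, 10, 100], d, N=3, m=1)
print(out[6])   # 147, i.e. F(1, 0) delayed by m*N = 3 samples
```

The output stream is the same as applying the operator of (42) directly; the difference is that its M-1 FIFOs hold N partial sums each instead of raw samples.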


However, this separation can be achieved by a single Z, for any value of N. Therefore, in order to achieve the improved computation rate without "overpaying" in parts, the following implementation can be used:

$$Z^{mN}F = \left[\sum_{k=0}^{M-1} Z^{k}\,\Pi(a'_k, Z^{k(N-1)})\right] D \qquad (44)$$


Note that the three occurrences of Z in (44) correspond to three different meanings: the first, on the left-hand side, represents the delay in the computation of F (relative to D) and does not represent any device. The second Z, in $Z^{k}$, represents the registers used for holding partial sums of products, and the third, in $Z^{k(N-1)}$, represents the $(N-1)$-stage shift registers used for delaying the input signal, D.

Figures (F18), (F19) and (F20) show the implementations, for m=2, of (42), (43) and (44), respectively.
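Form (44) can be sketched in the same way. In this hypothetical `hybrid_stream`, each stage has one register on its partial sum (the $Z^k$ factor accumulates one Z per stage), and the raw data passes through a shift register tapped every N-1 samples (the $Z^{k(N-1)}$ factor). As assumed here, `a[k]` is $a'_k$ and `d[N*i + j]` is D(i,j).

```python
from collections import deque

# A sketch with hypothetical names, not the report's own implementation.

def hybrid_stream(a, d, N, m):
    """Model of (44): Z^{mN} F = [sum_k Z^k Pi(a'_k, Z^{k(N-1)})] D."""
    M = 2 * m + 1
    length = (M - 1) * (N - 1) + 1
    raw = deque([0] * length, maxlen=length)  # shift register on D, tap k at k*(N-1)
    regs = [0] * M                           # regs[k-1]: single Z after stage k; regs[M-1] stays 0
    out = []
    for sample in d:
        raw.appendleft(sample)
        out.append(a[0] * sample + regs[0])
        for k in range(1, M):                # stage k: a'_k times Z^{k(N-1)} D plus later stages
            regs[k - 1] = a[k] * raw[k * (N - 1)] + regs[k]
    return out


D = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]   # hypothetical: 4 bursts, N = 3
d = [v for burst in D for v in burst]
out = hybrid_stream([1, 10, 100], d, N=3, m=1)
print(out[6])   # 147, the same stream as forms (42) and (43)
```

Here only M-1 registers hold partial sums, while the remaining (M-1)(N-1) delay stages hold raw samples, which is the saving in parts that motivates (44).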
7. SUMMARY AND CONCLUSIONS


We have shown that the mathematical notation commonly used for the specification of a computation may implicitly suggest some design features that are not necessarily desired.

We  suggest  that  the  mathematical  definition  be  transformed  into  the 
computational  network  representation  notation,  which  can  be  evaluated 
according  to  the  important  design  objectives. 

Furthermore, this representation can be transformed symbolically, as opposed to graphically, in order to generate alternative networks, which should also be evaluated according to the design objectives.

These transformations should continue until no further improvement is achieved.

Finally, we suggest that it is feasible to implement an automatic system for performing these symbolic transformations and evaluations, and we highly recommend it.


